Neutrophil Subclustering

Surgical Control Samples

For this analysis, I converted the raw counts into counts per million (CPM) and filtered out genes with an average CPM less than 1 across all surgical control samples. This is a fairly permissive inclusion criterion, and it left us with 10,790 “expressed” genes. PCA was then performed on the \(\log(\text{CPM}+1)\) values.
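As a rough illustration of the normalization and filtering step described above, here is a minimal sketch in plain Python. The function names, the toy count matrix, and the use of the natural log are my assumptions for illustration; they are not the code or the data behind the numbers in this report.

```python
import math

def cpm(counts):
    """Convert a {gene: [count per sample]} table to counts per million."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts in each sample (column sums).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }

def filter_and_log(cpm_values, min_mean_cpm=1.0):
    """Keep genes with mean CPM >= threshold, then return log(CPM + 1)."""
    kept = {
        gene: row
        for gene, row in cpm_values.items()
        if sum(row) / len(row) >= min_mean_cpm
    }
    # Natural log is an assumption; the base is not stated in the text.
    return {gene: [math.log(v + 1.0) for v in row] for gene, row in kept.items()}

# Hypothetical toy data: 3 genes x 2 samples, each sample totals 1,000,000 reads.
toy_counts = {
    "geneA": [500_000, 400_000],
    "geneB": [499_999, 600_000],
    "geneC": [1, 0],  # mean CPM of 0.5, so it falls below the cutoff
}
toy_cpm = cpm(toy_counts)
toy_expressed = filter_and_log(toy_cpm)
```

In this toy example only geneA and geneB pass the filter; the PCA would then be run on the values in `toy_expressed`.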

These samples mostly cluster by timepoint, with clear “A”, “B”, and “C” clusters. However, samples “SC7C” and “SC8C” clearly cluster with the “A” group. Additionally, a group of 3 samples sits somewhere between the “A” and “B” clusters. They’re a bit too spread out for me to call them a fourth cluster, but they’re hard to classify. The PCA plot below is a reasonably good representation of the data: the top 3 PCs explain 56.1% of the overall variance (36.7%, 13.3%, and 6.1%, respectively).

[1] "Samples:"
 [1] "SC1" "SC1" "SC1" "SC2" "SC2" "SC2" "SC3" "SC3" "SC3" "SC4" "SC4" "SC4"
[13] "SC5" "SC5" "SC5" "SC6" "SC6" "SC6" "SC7" "SC7" "SC7" "SC8" "SC8" "SC8"

Timepoint A = green

Timepoint B = red

Timepoint C = purple
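For reference, the "percent of variance explained" figures quoted for the PCA are each component's variance divided by the total variance. A minimal sketch of that arithmetic follows; the helper name and the toy variances are hypothetical, while the three percentages are the ones reported above.

```python
def explained_variance(component_variances):
    """Return each component's share of the total variance."""
    total = sum(component_variances)
    return [v / total for v in component_variances]

# Hypothetical per-component variances, for illustration only.
toy_variances = [5.0, 3.0, 2.0]
toy_ratios = explained_variance(toy_variances)

# Percentages reported in the text for the top 3 PCs of the control samples.
reported_top3 = [36.7, 13.3, 6.1]
reported_total = round(sum(reported_top3), 1)
```

Summing the three reported values gives the combined share of variance shown by a 3-PC plot.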

I tried a few different clustering methods and there was no consensus on how to classify the three intermediate samples (“SC4A”, “SC8A”, and “SC1B”).

Here, I used a consensus clustering approach and the intermediate samples clustered with the “B” group. Colors in the heatmap represent the proportion of times a pair of samples clustered together; darker colors indicate pairs that clustered together more often.
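To make the heatmap quantity concrete, here is a minimal sketch of how a consensus (co-clustering) matrix can be computed from repeated clustering runs. It assumes each run clusters a subsample and that the proportion is taken over runs where both samples were included; the function name, the sample names `s1` to `s3`, and the run labels are hypothetical and do not correspond to the actual analysis.

```python
from itertools import combinations

def consensus_matrix(samples, runs):
    """Proportion of runs in which each pair of samples shares a cluster.

    `runs` is a list of dicts mapping sample -> cluster label; a sample that
    was not drawn in a given subsample is simply absent from that dict.
    """
    together = {pair: 0 for pair in combinations(samples, 2)}
    both_present = {pair: 0 for pair in together}
    for labels in runs:
        for a, b in together:
            if a in labels and b in labels:
                both_present[(a, b)] += 1
                if labels[a] == labels[b]:
                    together[(a, b)] += 1
    return {
        pair: (together[pair] / both_present[pair]) if both_present[pair] else float("nan")
        for pair in together
    }

# Hypothetical cluster assignments from four resampled runs.
toy_runs = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 0, "s2": 1},
    {"s1": 1, "s2": 1, "s3": 0},
    {"s2": 0, "s3": 0},
]
toy_consensus = consensus_matrix(["s1", "s2", "s3"], toy_runs)
```

Values near 1 mean a pair almost always lands in the same cluster, and values near 0 mean it almost never does. Samples with many mid-range values are the ones that are hard to assign.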

Here, I used a Gaussian mixture model (on the top 4 PCs) and the intermediate samples clustered with the “A” group. Colors in the heatmap represent distances between pairs of samples; darker colors indicate pairs that are farther apart.
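The pairwise distances shown in a heatmap like this one can be sketched as follows. This assumes Euclidean distance on the top 4 PC scores, which is my guess rather than something stated above; the sample names and scores are made up for illustration.

```python
import math

def pairwise_distances(scores):
    """Euclidean distance between every pair of samples in PC space."""
    names = list(scores)
    return {
        (a, b): math.dist(scores[a], scores[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical PC1-PC4 scores for three samples.
toy_scores = {
    "s1": (0.0, 0.0, 0.0, 0.0),
    "s2": (3.0, 4.0, 0.0, 0.0),
    "s3": (0.0, 0.0, 0.0, 1.0),
}
toy_distances = pairwise_distances(toy_scores)
```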

I slightly prefer the second method because it chose the number of clusters based on BIC (selecting among 1 to 6 clusters) and, with so few samples, I think a parametric model makes sense.
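For readers unfamiliar with BIC-based selection, here is a minimal sketch of the idea. It assumes full (unconstrained) covariance matrices and the "lower is better" form of BIC; some software reports the negated quantity and maximizes it instead. The log-likelihood values below are invented for illustration and are not results from this analysis.

```python
import math

def gmm_param_count(k, d):
    """Free parameters of a k-component GMM with full covariances in d dims."""
    weights = k - 1                      # mixing proportions sum to 1
    means = k * d
    covariances = k * d * (d + 1) // 2   # symmetric d x d matrix per component
    return weights + means + covariances

def bic(log_likelihood, n_params, n_samples):
    """BIC in the 'lower is better' convention."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

def select_k(log_likelihoods, d, n_samples):
    """Return the k with the smallest BIC, plus all BIC scores."""
    scores = {
        k: bic(ll, gmm_param_count(k, d), n_samples)
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical maximized log-likelihoods for k = 1..3 on 24 samples, 4 PCs.
toy_loglik = {1: -200.0, 2: -150.0, 3: -140.0}
toy_best_k, toy_bic = select_k(toy_loglik, d=4, n_samples=24)
```

The parameter count grows quickly with the number of components (14, 29, and 44 for 1, 2, and 3 components in 4 dimensions under these assumptions), so the penalty term matters a lot when there are only a couple dozen samples.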

Sepsis Samples

I don’t think there’s anything super interesting going on here. You said your prior analysis indicated that this was kind of a mess, and I tend to agree. In the PCA plot, the top 3 PCs explain 26.9%, 20.4%, and 14.3% of the variance, respectively.

MMP8+ = black

MMP8- = red

I ran the same clustering methods as above, though with only 12 samples, I definitely think the Gaussian mixture model is more appropriate, so I’m only showing those results. That said, it picked 3 clusters, so I don’t really trust it that much. Based on the heatmap of pairwise distances, there does not seem to be much structure in the data.